Abstract Body



Background / Context: 

A renewed effort to ensure that publicly funded and collected data remains accessible to 
researchers has recently gained governmental and academic interest (Council on Governmental 
Relations, 2006). Organizations such as the Inter-University Consortium for Political and Social 
Research (ICPSR) have long archived and collected large data sets, and the National Institutes of 
Health (NIH) and the National Science Foundation (NSF) both have formal requirements for 
grantees around plans for sharing and archiving data. These databases have the potential to 
enable advanced analysis for both policy and practice. 

While these data archives provide researchers the opportunity to perform secondary 
analyses, they also engender the opportunity for new methods of meta-analysis. In medicine, 
where individual patient data is more commonly available than in the social sciences, 
methodologists have outlined a number of methods for combining individual participant data 
with the more traditional aggregated data usually collected in a meta-analysis. The purpose of 
this presentation is to illustrate methods of meta-analysis that combine both individual 
participant data (IPD) and aggregated data (AD) from traditional meta-analyses. Our example is 
based on an on-going project that uses data from Greenwald, Hedges, and Laine’s (1996) meta- 
analysis of 60 primary research studies that synthesized aggregated data on education production 
functions. At least six of the studies included in this meta-analysis used data from publicly 
available data sets. The presentation will compare the results from traditional aggregated data 
meta-analysis with a range of methods that incorporate both aggregated and individual level data. 

Cooper & Patall (2009) recently outlined the benefits and limitations of IPD meta- 
analysis for issues in the social sciences. The advantages of incorporating individual participant 
data include but are not limited to: 

• Increased collaboration across researchers: As mentioned earlier, the National Science 
Foundation and the National Institutes of Health both have developed policies for data 
sharing. The National Institutes of Health (2003) statement on sharing research data 
indicates that all applications with direct costs above $500,000 must address data sharing. 
Curran & Hussong (2009) and Shrout (2009) both provide examples of collaborations 
that have been developed around the pooled data sets. 

• Obtaining missing data and checking original analyses: One advantage Cooper & Patall 
(2009) cite for IPD is the ability to check the original data from the primary studies, and 
to fit models that were not possible with only the data provided in the studies. For 
example, the primary data set may include outcome measures or characteristics of 
participants not reported in the original study. The problem of outcome reporting bias 
has been discussed by Orwin & Cordray (1985) in the social science literature, and is a 
source of considerable discussion in medicine (Turner, Matthews, Linardatos, Tell, & 
Rosenthal, 2008; Vedula, Bero, Scherer, & Dickersin, 2009). Missing data is also a 
problem in aggregated data analysis (Pigott, 2009) when particular moderators of effect 
size are not all reported across studies or when information to compute an effect size is 
not present. With the original data, effect sizes can be computed with full information, 
and analyses of effect size variation can use more detailed background characteristics of 
the study and participants. 

• Increased statistical power: Another potential benefit of IPD meta-analysis is in statistical 
power. Simmonds & Higgins (2007) compare the power for detecting interactions 
among study level characteristics and effect sizes in an AD meta-analysis versus an IPD 
meta-analysis. Under many conditions, an IPD meta-analysis has greater power than an 
AD meta-analysis. These conditions depend on the variation in the variables that are 
potential moderators of study effect both within and between studies. 

• Examining differential effects while minimizing aggregation bias: As researchers 
developing multi-level modeling have long stressed, aggregation bias operates within 
nested educational data (Raudenbush & Bryk, 2002) and should be carefully monitored in 
conclusions of AD meta-analysis (Cooper & Patall, 2009; Schmid et al., 2004). Having 
the individual participant data allows the examination of differences in treatment 
effectiveness at the level of the individual rather than at the level of the study. Being able 
to make inferences at the individual participant level not only avoids aggregation bias, it 
may lead to inferences for a meta-analysis that are more readily applied to practice. 

• Broadening the psychometric evaluation of constructs: As illustrated by Bauer & 
Hussong (2009), pooled individual participant data provides the opportunity to examine 
the psychometric properties of measures used across studies, and in some cases to 
develop commensurate measures across studies. In AD meta-analysis, by necessity we 
assume that measures of a similar construct are comparable, but without the individual 
level data and information about the measures’ psychometric properties, we cannot be 
sure if these measures share similar properties. Under certain conditions, say when 
studies share some items in common, measures of a construct could be linked across 
studies using item response theory. Comparing across measures may also lead to the 
development of more sensitive assessments that can be shared across studies. 

• Allowing more complex analyses of the primary data: Much of the research on meta- 
analysis methodology in the social sciences focuses on methods for combining results of 
complex statistical analyses across studies. For example, many reviewers are forced to 
exclude studies that report regression results since we do not have methods for combining 
across different regression models. With individual level data, problems with combining 
across different regression models could be alleviated by estimating similar models 
across studies with the original data. 

Purpose / Objective / Research Question / Focus of Study: 

The focus and purpose of this research is to examine the benefits, limitations, and 
implications of Individual Participant Data (IPD) meta-analysis in education. Comprehensive 
research reviews in education have been limited to the use of aggregated data (AD) meta- 
analysis, techniques based on quantitatively combining information from studies on the same 
topic. These analyses have obvious benefits, but can at times be limiting. 

The proposed project will conduct an IPD meta-analysis on studies focused on estimating 
an education production function. Our research goal is to understand the benefits and limitations 
of IPD meta-analysis compared to AD meta-analysis. More specifically, our research questions 
are the following: 

1. What are the methods suggested in the literature for conducting a meta-analysis that 
combines both aggregated data and individual participant data? 

2. How do the results of the original analysis compare with the meta-analysis using both 
aggregated and individual data? 

3. What are potential benefits and limitations of the use of IPD meta-analysis in the 
social sciences? 

Significance / Novelty of study: 

Although medical researchers have conducted IPD meta-analyses, little attention has 
been paid to its use in education or psychology (Curran & Hussong, 2009). Cooper & Patall 
(2009) discussed the benefits and limitations in general for IPD versus AD meta-analysis, but did 
not conduct an analysis. In psychology, a special issue of Psychological Methods is devoted to 
issues related to Integrated Data Analysis (Bauer & Hussong, 2009; Cooper & Patall, 2009; 
Curran, 2009; Curran & Hussong, 2009; Hofer & Piccinin, 2009; Shrout, 2009). To date, the 
only meta-analysis in education utilizing individual-level data was conducted by Goldstein, 
Yang, Omar, Turner & Thompson (2000) combining data from studies of the effects of class size 
with primary data from the Tennessee STAR experiment. This study will utilize 3-4 datasets 
initially utilized in studies that are included in the Greenwald, Hedges, and Laine (1996) meta- 
analysis to compare the findings of these two methods and assess the feasibility and cost of an 
IPD analysis. 



Statistical, Measurement, or Econometric Model: 

The data analysis will compare two strategies for analyzing a mix of individual level and 
aggregated level data. The one-stage method is based on multi-level modeling techniques, while 
the two-stage method first obtains the aggregated data for all studies (either from IPD or from 
study reports), and then synthesizes the results. In order to present the models that will be used, 
we begin with a brief outline of random effects models for aggregated data meta-analysis, and 
then develop the models that will be used in this project. 

Random effects model of effect size with aggregated data. The typical random effects 
model of effect size can be written using a two-level hierarchical model as outlined by 
Raudenbush (2009). We compute an effect size $T_i$ from study $i$, where $i = 1, \ldots, k$, and also the 
variance of that effect size, $v_i$, using statistics for that study. Level 1 is given as 

$$T_i = \theta_i + e_i, \quad e_i \sim N(0, v_i). \tag{1}$$

In AD meta-analysis, we assume that $v_i$ is known. Level 2 is given as 

$$\theta_i = \theta + u_i, \quad u_i \sim N(0, \sigma_\theta^2), \tag{2}$$

where $\theta$ is the overall mean effect size, and the variance component is given as $\sigma_\theta^2$. The random 
effects variance can be estimated either directly using the method of moments or with restricted 
maximum likelihood as suggested by Raudenbush (2009). 
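
As a minimal sketch of the method-of-moments option, the function below computes the usual moment estimator of the between-study variance (often called the DerSimonian-Laird estimator, a name not used in this abstract) and the resulting random effects mean. The function name, argument names, and the three example studies are hypothetical and are not taken from the Greenwald, Hedges, and Laine data.

```python
from math import sqrt


def random_effects_mom(effects, variances):
    """Method-of-moments random effects estimate.

    effects   -- study effect sizes (one per study)
    variances -- their sampling variances, treated as known
    Returns (mean effect, between-study variance, standard error of the mean).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed_mean = sum(wi * t for wi, t in zip(w, effects)) / sum_w

    # Homogeneity statistic Q; under the random effects model its expectation
    # is (k - 1) + c * between_var, which is solved for between_var below.
    q = sum(wi * (t - fixed_mean) ** 2 for wi, t in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    between_var = max(0.0, (q - (k - 1)) / c)  # truncate negative estimates at 0

    # Random-effects weights add the between-study variance to each study's variance.
    w_star = [1.0 / (v + between_var) for v in variances]
    mean_effect = sum(wi * t for wi, t in zip(w_star, effects)) / sum(w_star)
    std_error = sqrt(1.0 / sum(w_star))
    return mean_effect, between_var, std_error


# Hypothetical effect sizes and variances, for illustration only.
example_effects = [0.0, 0.4, 0.8]
example_variances = [0.04, 0.04, 0.04]
mean_effect, between_var, std_error = random_effects_mom(example_effects, example_variances)
```

Restricted maximum likelihood would replace the closed-form moment step with an iterative fit; the weighting step after the variance component is estimated stays the same.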

Random effects model for IPD. In order to illustrate the one- and two-stage methods, 
we need a model for the data from a study that provides individual participant data. The 
outcome for IPD is the individual participant’s response on the target measurement for 
participant $j$ in study $i$. Note that in AD meta-analysis, we use summary statistics from 
the primary reports of research to compute an effect size for each study. In order to make the 
outcomes parallel in an IPD analysis with an AD analysis, we will use the standardized outcome, 
denoted here by $y_{ij}$ for student $j$, $j = 1, \ldots, n_i$, in study $i$, $i = 1, \ldots, k$. Thus, each student’s outcome 
will be standardized using the overall mean and standard deviation of the outcome observed in 
that study. We do this so that we can synthesize study outcomes that are not using the same 
measure of a construct. We can write a hierarchical linear or mixed model for our IPD data 
following Riley et al. (2008). The model is given as 

$$\begin{aligned} y_{ij} &= \phi_i + \theta_i x_{ij} + e_{ij} \\ \theta_i &= \theta + u_i \\ e_{ij} &\sim N(0, 1) \end{aligned}$$

where $x_{ij}$ is a 0/1 code designating control or treatment group membership, the fixed study effect 
is $\phi_i$, with the random treatment effect in study $i$ given by $\theta_i$. The variance within each study for 
the outcome is 1, since we have standardized our outcomes, and the variance for the $\theta_i$ is $\sigma_\theta^2$. 

Our goal in an IPD meta-analysis would be to estimate the mean treatment effect, $\theta$, and its 
variance, $\sigma_\theta^2$, using standard methods of hierarchical linear models (Raudenbush & Bryk, 
2002). 

Two-stage method with both IPD and AD. As Riley et al. (2008) find, the easiest 
method to employ with a mix of IPD and AD is a two-stage model. The researcher first 
computes the study level effect sizes from each IPD study, and then continues with estimating 
the random effects model given in Eqns. 1 and 2 using the effect sizes from all studies. 
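
The first stage can be sketched as follows, under stated assumptions. Each IPD study is reduced to an effect size and a variance, and those are placed in one table alongside the effect sizes reported by the AD studies. The effect size here is the treatment-control difference in mean standardized outcomes, and its variance uses the unit residual variance assumed for standardized outcomes; both choices are illustrative assumptions, as are all function names, field names, and data values.

```python
from statistics import mean, stdev


def ipd_to_effect_size(outcomes, treated):
    """Reduce one IPD study to an (effect size, variance) pair.

    outcomes -- raw outcomes, one per participant
    treated  -- 0/1 codes, 1 for the treatment group
    """
    # Standardize with the overall mean and SD observed in this study.
    m, s = mean(outcomes), stdev(outcomes)
    z = [(y - m) / s for y in outcomes]

    z_treat = [zi for zi, x in zip(z, treated) if x == 1]
    z_ctrl = [zi for zi, x in zip(z, treated) if x == 0]

    effect = mean(z_treat) - mean(z_ctrl)
    # Assumption: residual variance of 1 for standardized outcomes, so the
    # variance of a difference in group means is 1/n_treat + 1/n_ctrl.
    variance = 1.0 / len(z_treat) + 1.0 / len(z_ctrl)
    return effect, variance


def stage_one(ipd_studies, ad_studies):
    """Build one table of study-level effect sizes from IPD and AD sources.

    ipd_studies -- {study name: (outcomes, treated)}
    ad_studies  -- {study name: (reported effect size, reported variance)}
    """
    table = []
    for name, (outcomes, treated) in ipd_studies.items():
        effect, variance = ipd_to_effect_size(outcomes, treated)
        table.append({"study": name, "source": "IPD", "T": effect, "v": variance})
    for name, (effect, variance) in ad_studies.items():
        table.append({"study": name, "source": "AD", "T": effect, "v": variance})
    return table


# Hypothetical inputs, for illustration only.
ipd = {"Study A": ([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])}
ad = {"Study B": (0.25, 0.02)}
effect_table = stage_one(ipd, ad)
```

The second stage then treats every row of the table identically, whatever its source, when fitting the random effects model.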

One-stage method with both IPD and AD. One-stage methods for a mix of IPD and 
AD would be analogous to fitting hierarchical linear models when some of the level-2 units do 
not provide individual level data. In Riley et al.’s formulation, we assume that AD studies 
provide their effect size estimate, $T_i = \hat{\theta}_i$, and its variance $v_i$, which is known. For studies that 
contribute individual level data, our model is the random effects model for IPD given above, 
where $y_{ij}$ is our standardized outcome for student $j$ in study $i$. In the combined IPD and AD model, studies with IPD 
contribute individual student outcome data, while the AD studies are assumed to have only one 
student with an outcome equal to $\hat{\theta}_i$, and residual variance known and equal to $v_i$, the variance of 
the particular effect size used in the analysis. Following Goldstein et al. (2000), Riley et al. add a 
dummy code, $D_i$, that takes the value 1 for IPD studies, and 0 for AD studies. The dummy code 
allows the estimation of the treatment effect $\theta_i$ for studies with IPD, and both AD and IPD 
studies to contribute to the estimation of the average treatment effect, $\theta$, and the between-study 
variance in the treatment effect, $\sigma_\theta^2$. Riley et al. give the model as 

$$\begin{aligned} y^*_{ij} &= D_i \phi_i + \theta_i x_{ij} + e^*_{ij} \\ \theta_i &= \theta + u_i \\ e^*_{ij} &\sim N(0, v^*_i) \\ u_i &\sim N(0, \sigma_\theta^2) \end{aligned} \tag{4}$$

As stated above, for each IPD study, the outcome is $y^*_{ij} = y_{ij}$ and the $v^*_i = 1$ since we have 
standardized our outcomes within studies. In the AD studies, we assume only one observation 
($j = 1$), and we set $x_{i1} = 1$. The response in the AD studies is the estimate of the effect size in that 
study, $y^*_{i1} = \hat{\theta}_i$, with variance $v^*_i = v_i$. 
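
The data layout implied by the combined model can be sketched as a single stacked table, shown below under stated assumptions. Each IPD participant contributes one row with the dummy code set to 1 and residual variance 1, and each AD study contributes one pseudo-observation with the dummy code set to 0, the treatment code set to 1, and residual variance equal to the variance of its effect size. The function and field names and the data values are hypothetical, and the sketch only builds the table; it does not fit the mixed model.

```python
def stack_one_stage(ipd_studies, ad_studies):
    """Stack IPD rows and AD pseudo-observations for a one-stage analysis.

    ipd_studies -- {study name: [(standardized outcome, 0/1 treatment code), ...]}
    ad_studies  -- {study name: (effect size estimate, its variance)}
    """
    rows = []
    for study, records in ipd_studies.items():
        for y_std, x in records:
            rows.append({
                "study": study,
                "D": 1,           # dummy code: study provides IPD
                "x": x,           # control (0) or treatment (1)
                "y_star": y_std,  # standardized individual outcome
                "v_star": 1.0,    # residual variance after standardization
            })
    for study, (effect, variance) in ad_studies.items():
        rows.append({
            "study": study,
            "D": 0,               # dummy code: study provides AD only
            "x": 1,               # assumption: single row coded as treatment
            "y_star": effect,     # response is the effect size estimate
            "v_star": variance,   # known variance of that effect size
        })
    return rows


# Hypothetical inputs, for illustration only.
ipd = {"Study A": [(-1.2, 0), (-0.4, 0), (0.4, 1), (1.2, 1)]}
ad = {"Study B": (0.25, 0.02)}
stacked = stack_one_stage(ipd, ad)
```

Because the dummy code is 0 for AD rows, the fixed study effect drops out for those studies, and their single response informs only the treatment effect terms.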

Usefulness / Applicability of Method: 

The usefulness and applicability of this method depend on the availability of 
primary data. While a traditional AD meta-analysis can produce results strictly from the reports 
of primary studies, the usefulness and power of an IPD meta-analysis require a number of 
datasets. Although this can be a limitation, we believe the call to action set forth by the NSF 
(among others) to disseminate primary data, together with the ease of communication 
(e.g., the internet), will only increase the applicability of these methods. 

Data Collection and Analysis: 

Original findings, as mentioned previously, derive from Greenwald, Hedges, and Laine 
(1996). GHL synthesized 60 primary studies, of which 36 utilized a large-scale dataset. 
Unfortunately, many of these studies were conducted in the 1970s and 1980s, prior to the 
widespread use of computers, and therefore the data are not available. However, we have 
obtained three datasets and are in the process of acquiring a fourth. These include: 

• Equality of Educational Opportunity Survey 

• Project Talent 

• High School and Beyond 

• Working to obtain a dataset from Illinois, Kentucky, and California 

All datasets include a measure of student achievement and at least one measure of (or proxy to) 
per-pupil expenditure. 

Conclusions: 

The methods proposed have the potential to further meta-analytic techniques. Although 
the “traditional” aggregated data analysis will remain paramount, the IPD analysis can enable 
more sophisticated and precise estimates of treatment effects. Data sharing limitations, data 
accessibility, and time restrictions potentially limit the capabilities of IPD. Nevertheless, the 
ability to test multiple variables, the reduction of aggregation bias, and increased statistical 
power are strong arguments in its favor. Indeed, IPD may be the future of meta-analysis, and our 
proposal provides the opportunity to further research on the method. 